Deep Learning Lab

In this notebook, we use deep learning architectures for regression and classification. We begin by installing and importing the necessary packages. Then, we build a deep learning model for a regression task and, finally, a deep learning model for a classification task.

In [39]:
# # Install packages:

# !pip install pandas
# !pip install tensorflow
# !pip install keras
# !pip install seaborn
# !pip install category_encoders
In [40]:
# Import packages:

import sys
import pandas as pd
pd.set_option('display.max_rows', 50)
import numpy as np
np.set_printoptions(threshold=sys.maxsize)
import matplotlib.pyplot as plt
import seaborn as sns
from category_encoders import *

# sklearn imports:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# keras imports:
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from keras.backend import clear_session

1. Deep Learning for Regression

Each row in the dataset corresponds to a property that was inspected and given a hazard score ("Hazard"). You can think of the hazard score as a continuous number that represents the condition of the property as determined by the inspection. Some inspection hazards are major and contribute more to the total score, while some are minor and contribute less. The total score for a property is the sum of the individual hazards.

The aim of the competition is to forecast the hazard score based on anonymized variables which are available before an inspection is ordered.

Source: https://www.kaggle.com/c/liberty-mutual-group-property-inspection-prediction/data

In [41]:
df = pd.read_csv("data/insurance.csv")

df.head()
Out[41]:
Id Hazard T1_V1 T1_V2 T1_V3 T1_V4 T1_V5 T1_V6 T1_V7 T1_V8 ... T2_V6 T2_V7 T2_V8 T2_V9 T2_V10 T2_V11 T2_V12 T2_V13 T2_V14 T2_V15
0 1 1 15 3 2 N B N B B ... 2 37 1 11 6 Y N E 2 2
1 2 4 16 14 5 H B N B B ... 2 22 1 18 5 Y Y E 2 1
2 3 1 10 10 5 N K N B B ... 6 37 2 14 6 Y Y E 6 1
3 4 1 18 18 5 N K N B B ... 2 25 1 1 6 Y N C 2 6
4 5 1 13 19 5 N H N B B ... 1 22 1 2 7 N N E 1 1

5 rows × 34 columns

In [42]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50999 entries, 0 to 50998
Data columns (total 34 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Id      50999 non-null  int64 
 1   Hazard  50999 non-null  int64 
 2   T1_V1   50999 non-null  int64 
 3   T1_V2   50999 non-null  int64 
 4   T1_V3   50999 non-null  int64 
 5   T1_V4   50999 non-null  object
 6   T1_V5   50999 non-null  object
 7   T1_V6   50999 non-null  object
 8   T1_V7   50999 non-null  object
 9   T1_V8   50999 non-null  object
 10  T1_V9   50999 non-null  object
 11  T1_V10  50999 non-null  int64 
 12  T1_V11  50999 non-null  object
 13  T1_V12  50999 non-null  object
 14  T1_V13  50999 non-null  int64 
 15  T1_V14  50999 non-null  int64 
 16  T1_V15  50999 non-null  object
 17  T1_V16  50999 non-null  object
 18  T1_V17  50999 non-null  object
 19  T2_V1   50999 non-null  int64 
 20  T2_V2   50999 non-null  int64 
 21  T2_V3   50999 non-null  object
 22  T2_V4   50999 non-null  int64 
 23  T2_V5   50999 non-null  object
 24  T2_V6   50999 non-null  int64 
 25  T2_V7   50999 non-null  int64 
 26  T2_V8   50999 non-null  int64 
 27  T2_V9   50999 non-null  int64 
 28  T2_V10  50999 non-null  int64 
 29  T2_V11  50999 non-null  object
 30  T2_V12  50999 non-null  object
 31  T2_V13  50999 non-null  object
 32  T2_V14  50999 non-null  int64 
 33  T2_V15  50999 non-null  int64 
dtypes: int64(18), object(16)
memory usage: 13.2+ MB
In [43]:
df.columns
Out[43]:
Index(['Id', 'Hazard', 'T1_V1', 'T1_V2', 'T1_V3', 'T1_V4', 'T1_V5', 'T1_V6',
       'T1_V7', 'T1_V8', 'T1_V9', 'T1_V10', 'T1_V11', 'T1_V12', 'T1_V13',
       'T1_V14', 'T1_V15', 'T1_V16', 'T1_V17', 'T2_V1', 'T2_V2', 'T2_V3',
       'T2_V4', 'T2_V5', 'T2_V6', 'T2_V7', 'T2_V8', 'T2_V9', 'T2_V10',
       'T2_V11', 'T2_V12', 'T2_V13', 'T2_V14', 'T2_V15'],
      dtype='object')
In [44]:
sns.pairplot(df[['Hazard','T1_V1', 'T1_V2' ]], diag_kind='kde')
Out[44]:
<seaborn.axisgrid.PairGrid at 0x1f1132a9280>
[Figure: pairplot of Hazard, T1_V1, and T1_V2, with KDE plots on the diagonal]
In [45]:
cat_cols = list(df.select_dtypes(include=['O']).columns) # find columns of data type object (categorical variables)
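
As a quick check (illustrative only, not part of the original run), we can print which columns were detected as categorical before encoding them:

print(len(cat_cols), 'categorical columns found:')
print(cat_cols)
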
In [46]:
encoder = BaseNEncoder(cols=cat_cols).fit(df)

df = encoder.transform(df)

df.head()
C:\Users\greer\AppData\Local\Programs\Python\Python38\lib\site-packages\category_encoders\utils.py:21: FutureWarning: is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead
  elif pd.api.types.is_categorical(cols):
Out[46]:
Id Hazard T1_V1 T1_V2 T1_V3 T1_V4_0 T1_V4_1 T1_V4_2 T1_V4_3 T1_V5_0 ... T2_V11_0 T2_V11_1 T2_V12_0 T2_V12_1 T2_V13_0 T2_V13_1 T2_V13_2 T2_V13_3 T2_V14 T2_V15
0 1 1 15 3 2 0 0 0 1 0 ... 0 1 0 1 0 0 0 1 2 2
1 2 4 16 14 5 0 0 1 0 0 ... 0 1 1 0 0 0 0 1 2 1
2 3 1 10 10 5 0 0 0 1 0 ... 0 1 1 0 0 0 0 1 6 1
3 4 1 18 18 5 0 0 0 1 0 ... 0 1 0 1 0 0 1 0 2 6
4 5 1 13 19 5 0 0 0 1 0 ... 1 0 0 1 0 0 0 1 1 1

5 rows × 73 columns

In [47]:
df.columns
Out[47]:
Index(['Id', 'Hazard', 'T1_V1', 'T1_V2', 'T1_V3', 'T1_V4_0', 'T1_V4_1',
       'T1_V4_2', 'T1_V4_3', 'T1_V5_0', 'T1_V5_1', 'T1_V5_2', 'T1_V5_3',
       'T1_V5_4', 'T1_V6_0', 'T1_V6_1', 'T1_V7_0', 'T1_V7_1', 'T1_V7_2',
       'T1_V8_0', 'T1_V8_1', 'T1_V8_2', 'T1_V9_0', 'T1_V9_1', 'T1_V9_2',
       'T1_V9_3', 'T1_V10', 'T1_V11_0', 'T1_V11_1', 'T1_V11_2', 'T1_V11_3',
       'T1_V11_4', 'T1_V12_0', 'T1_V12_1', 'T1_V12_2', 'T1_V13', 'T1_V14',
       'T1_V15_0', 'T1_V15_1', 'T1_V15_2', 'T1_V15_3', 'T1_V16_0', 'T1_V16_1',
       'T1_V16_2', 'T1_V16_3', 'T1_V16_4', 'T1_V16_5', 'T1_V17_0', 'T1_V17_1',
       'T2_V1', 'T2_V2', 'T2_V3_0', 'T2_V3_1', 'T2_V4', 'T2_V5_0', 'T2_V5_1',
       'T2_V5_2', 'T2_V5_3', 'T2_V6', 'T2_V7', 'T2_V8', 'T2_V9', 'T2_V10',
       'T2_V11_0', 'T2_V11_1', 'T2_V12_0', 'T2_V12_1', 'T2_V13_0', 'T2_V13_1',
       'T2_V13_2', 'T2_V13_3', 'T2_V14', 'T2_V15'],
      dtype='object')
In [48]:
X = df.drop(columns = ['Id', 'Hazard']) # Drop Id and Hazard from the list of predictors
Y = df.Hazard # This is what we want to predict 

It is good practice to rescale features that use different scales and ranges.

One reason this is important is that the features are multiplied by the model weights, so the scale of the inputs affects the scale of the outputs and of the gradients.

Although a model might converge without scaling, scaling the input data makes training much more stable.

In [49]:
scaler = MinMaxScaler() # Define the scaling method
scaler.fit(X) # Learn each column's min and max from the data
X = scaler.transform(X) # Rescale every column to the [0, 1] range
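
Under the hood, MinMaxScaler maps each feature value x to (x - x_min) / (x_max - x_min), so every column ends up in the [0, 1] range. A minimal sanity check (illustrative, not part of the original run):

# After scaling, every column of X should lie within [0, 1].
print(X.min(axis=0).min(), X.max(axis=0).max())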

After scaling our data, we can go ahead and split it into train and test subsets:

In [50]:
trainData, testData, trainLabels, testLabels = train_test_split(X, Y, test_size=.2, random_state = 1)

Now, we build and train the neural network. It takes the 71 input variables ('T1_V1', 'T1_V2', 'T1_V3', ...), passes them through two hidden layers of 71 and 50 neurons (both with ReLU activations), and produces the output through a final layer with a linear activation.

In [51]:
# Clear the previous model:
clear_session()

model = Sequential()
model.add(Dense(71, # Number of nodes in the first layer 
                input_dim=71, # Number of inputs (predictors)
                activation='relu' # Activation function for this layer
               ))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='linear'))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 71)                5112      
_________________________________________________________________
dense_1 (Dense)              (None, 50)                3600      
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 51        
=================================================================
Total params: 8,763
Trainable params: 8,763
Non-trainable params: 0
_________________________________________________________________
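
The parameter counts in the summary can be verified by hand: a Dense layer has (number of inputs × number of units) weights plus one bias per unit. A quick check (illustrative, not part of the original run):

print(71 * 71 + 71)  # first hidden layer:  5,112 parameters
print(71 * 50 + 50)  # second hidden layer: 3,600 parameters
print(50 * 1 + 1)    # output layer:        51 parameters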

After designing the network architecture, we can go ahead and compile it:

In [52]:
model.compile(loss='mse', optimizer='adam', metrics=['mse','mae'])

And finally, we can go ahead and fit the model using the train data:

In [53]:
history = model.fit(trainData, trainLabels, # Data to be used for fitting/ training the model
                    epochs=150, # Number of times that the learning algorithm will work through the training data
                    batch_size=50, # Number of samples to be used in each iteration
                    verbose=1, # Whether to print the progress 
                    validation_split=0.2 # The portion of samples to be used for validation (different from our test data)
                   )
[training output truncated; epochs 1-32 omitted]
Epoch 33/150
653/653 [==============================] - 1s 1ms/step - loss: 12.4451 - mse: 12.4451 - mae: 2.5620 - val_loss: 15.1475 - val_mse: 15.1475 - val_mae: 2.7607
Epoch 34/150
653/653 [==============================] - 1s 1ms/step - loss: 12.4541 - mse: 12.4541 - mae: 2.5595 - val_loss: 15.1836 - val_mse: 15.1836 - val_mae: 2.8000
Epoch 35/150
653/653 [==============================] - 1s 1ms/step - loss: 12.3614 - mse: 12.3614 - mae: 2.5700 - val_loss: 15.6819 - val_mse: 15.6819 - val_mae: 2.8300
Epoch 36/150
653/653 [==============================] - 1s 1ms/step - loss: 12.2788 - mse: 12.2788 - mae: 2.5550 - val_loss: 15.3044 - val_mse: 15.3044 - val_mae: 2.8126
Epoch 37/150
653/653 [==============================] - 1s 1ms/step - loss: 12.0330 - mse: 12.0330 - mae: 2.5368 - val_loss: 15.3015 - val_mse: 15.3015 - val_mae: 2.8219
Epoch 38/150
653/653 [==============================] - 1s 1ms/step - loss: 11.8552 - mse: 11.8552 - mae: 2.5174 - val_loss: 16.0958 - val_mse: 16.0958 - val_mae: 2.8724
Epoch 39/150
653/653 [==============================] - 1s 1ms/step - loss: 11.7868 - mse: 11.7868 - mae: 2.5217 - val_loss: 15.7708 - val_mse: 15.7708 - val_mae: 2.8639
Epoch 40/150
653/653 [==============================] - 1s 1ms/step - loss: 11.4942 - mse: 11.4942 - mae: 2.4990 - val_loss: 15.8753 - val_mse: 15.8753 - val_mae: 2.8964
Epoch 41/150
653/653 [==============================] - 1s 1ms/step - loss: 11.8621 - mse: 11.8621 - mae: 2.5114 - val_loss: 16.2086 - val_mse: 16.2086 - val_mae: 2.8778
Epoch 42/150
653/653 [==============================] - 1s 1ms/step - loss: 11.5587 - mse: 11.5587 - mae: 2.4945 - val_loss: 16.0530 - val_mse: 16.0530 - val_mae: 2.8490
Epoch 43/150
653/653 [==============================] - 1s 1ms/step - loss: 11.6200 - mse: 11.6200 - mae: 2.4894 - val_loss: 15.5875 - val_mse: 15.5875 - val_mae: 2.8561
Epoch 44/150
653/653 [==============================] - 1s 1ms/step - loss: 11.7092 - mse: 11.7092 - mae: 2.5024 - val_loss: 16.4461 - val_mse: 16.4461 - val_mae: 2.8565
Epoch 45/150
653/653 [==============================] - 1s 1ms/step - loss: 11.3005 - mse: 11.3005 - mae: 2.4738 - val_loss: 16.2526 - val_mse: 16.2526 - val_mae: 2.8795
Epoch 46/150
653/653 [==============================] - 1s 1ms/step - loss: 11.2130 - mse: 11.2130 - mae: 2.4635 - val_loss: 15.3495 - val_mse: 15.3495 - val_mae: 2.7682
Epoch 47/150
653/653 [==============================] - 1s 1ms/step - loss: 11.3395 - mse: 11.3395 - mae: 2.4858 - val_loss: 15.7476 - val_mse: 15.7476 - val_mae: 2.8076
Epoch 48/150
653/653 [==============================] - 1s 1ms/step - loss: 10.9136 - mse: 10.9136 - mae: 2.4376 - val_loss: 15.9867 - val_mse: 15.9867 - val_mae: 2.8470
Epoch 49/150
653/653 [==============================] - 1s 1ms/step - loss: 11.1828 - mse: 11.1828 - mae: 2.4715 - val_loss: 16.3116 - val_mse: 16.3116 - val_mae: 2.9497
Epoch 50/150
653/653 [==============================] - 1s 1ms/step - loss: 10.9622 - mse: 10.9622 - mae: 2.4445 - val_loss: 15.7518 - val_mse: 15.7518 - val_mae: 2.7989
Epoch 51/150
653/653 [==============================] - 1s 1ms/step - loss: 11.2511 - mse: 11.2511 - mae: 2.4679 - val_loss: 16.6015 - val_mse: 16.6015 - val_mae: 2.8537
Epoch 52/150
653/653 [==============================] - 1s 1ms/step - loss: 10.9847 - mse: 10.9847 - mae: 2.4364 - val_loss: 16.6305 - val_mse: 16.6305 - val_mae: 2.9693
Epoch 53/150
653/653 [==============================] - 1s 1ms/step - loss: 10.7975 - mse: 10.7975 - mae: 2.4245 - val_loss: 16.1907 - val_mse: 16.1907 - val_mae: 2.8018
Epoch 54/150
653/653 [==============================] - 1s 1ms/step - loss: 10.9444 - mse: 10.9444 - mae: 2.4419 - val_loss: 16.4874 - val_mse: 16.4874 - val_mae: 2.8576
Epoch 55/150
653/653 [==============================] - 1s 1ms/step - loss: 10.6455 - mse: 10.6455 - mae: 2.4061 - val_loss: 16.9427 - val_mse: 16.9427 - val_mae: 2.9998
Epoch 56/150
653/653 [==============================] - 1s 1ms/step - loss: 10.7212 - mse: 10.7212 - mae: 2.4199 - val_loss: 16.5428 - val_mse: 16.5428 - val_mae: 2.9355
Epoch 57/150
653/653 [==============================] - 1s 1ms/step - loss: 10.9090 - mse: 10.9090 - mae: 2.4362 - val_loss: 17.2635 - val_mse: 17.2635 - val_mae: 3.0122
Epoch 58/150
653/653 [==============================] - 1s 1ms/step - loss: 10.6057 - mse: 10.6057 - mae: 2.4141 - val_loss: 16.8571 - val_mse: 16.8571 - val_mae: 2.9873
Epoch 59/150
653/653 [==============================] - 1s 1ms/step - loss: 10.7327 - mse: 10.7327 - mae: 2.4160 - val_loss: 16.5285 - val_mse: 16.5285 - val_mae: 2.9094
Epoch 60/150
653/653 [==============================] - 1s 1ms/step - loss: 10.7335 - mse: 10.7335 - mae: 2.4211 - val_loss: 16.5299 - val_mse: 16.5299 - val_mae: 2.8520
Epoch 61/150
653/653 [==============================] - 1s 1ms/step - loss: 10.6042 - mse: 10.6042 - mae: 2.3953 - val_loss: 16.3129 - val_mse: 16.3129 - val_mae: 2.8478
Epoch 62/150
653/653 [==============================] - 1s 1ms/step - loss: 10.6259 - mse: 10.6259 - mae: 2.3917 - val_loss: 17.2160 - val_mse: 17.2160 - val_mae: 2.9636
Epoch 63/150
653/653 [==============================] - 1s 1ms/step - loss: 10.1592 - mse: 10.1592 - mae: 2.3662 - val_loss: 16.9502 - val_mse: 16.9502 - val_mae: 2.9079
Epoch 64/150
653/653 [==============================] - 1s 1ms/step - loss: 10.2400 - mse: 10.2400 - mae: 2.3763 - val_loss: 17.0109 - val_mse: 17.0109 - val_mae: 2.8807
Epoch 65/150
653/653 [==============================] - 1s 1ms/step - loss: 10.1085 - mse: 10.1085 - mae: 2.3614 - val_loss: 17.7820 - val_mse: 17.7820 - val_mae: 3.0172
Epoch 66/150
653/653 [==============================] - 1s 1ms/step - loss: 10.4372 - mse: 10.4372 - mae: 2.3955 - val_loss: 17.1323 - val_mse: 17.1323 - val_mae: 2.9147
Epoch 67/150
653/653 [==============================] - 1s 1ms/step - loss: 10.1000 - mse: 10.1000 - mae: 2.3616 - val_loss: 18.2715 - val_mse: 18.2715 - val_mae: 2.9727
Epoch 68/150
653/653 [==============================] - 1s 1ms/step - loss: 9.8974 - mse: 9.8974 - mae: 2.3453 - val_loss: 16.6812 - val_mse: 16.6812 - val_mae: 2.8579
Epoch 69/150
653/653 [==============================] - 1s 1ms/step - loss: 10.0576 - mse: 10.0576 - mae: 2.3547 - val_loss: 17.3863 - val_mse: 17.3863 - val_mae: 2.9603
Epoch 70/150
653/653 [==============================] - 1s 1ms/step - loss: 10.1832 - mse: 10.1832 - mae: 2.3737 - val_loss: 17.4751 - val_mse: 17.4751 - val_mae: 2.9471
Epoch 71/150
653/653 [==============================] - 1s 1ms/step - loss: 9.9199 - mse: 9.9199 - mae: 2.3475 - val_loss: 17.6452 - val_mse: 17.6452 - val_mae: 2.9318
Epoch 72/150
653/653 [==============================] - 1s 1ms/step - loss: 9.9477 - mse: 9.9477 - mae: 2.3270 - val_loss: 17.7989 - val_mse: 17.7989 - val_mae: 2.9820
Epoch 73/150
653/653 [==============================] - 1s 1ms/step - loss: 9.9060 - mse: 9.9060 - mae: 2.3359 - val_loss: 17.3511 - val_mse: 17.3511 - val_mae: 2.9576
Epoch 74/150
653/653 [==============================] - 1s 1ms/step - loss: 10.1143 - mse: 10.1143 - mae: 2.3618 - val_loss: 17.7489 - val_mse: 17.7489 - val_mae: 3.0112
Epoch 75/150
653/653 [==============================] - 1s 1ms/step - loss: 9.8226 - mse: 9.8226 - mae: 2.3259 - val_loss: 18.0090 - val_mse: 18.0090 - val_mae: 3.0124
Epoch 76/150
653/653 [==============================] - 1s 1ms/step - loss: 9.8151 - mse: 9.8151 - mae: 2.3225 - val_loss: 17.1872 - val_mse: 17.1872 - val_mae: 2.8750
Epoch 77/150
653/653 [==============================] - 1s 1ms/step - loss: 9.7334 - mse: 9.7334 - mae: 2.3126 - val_loss: 17.9322 - val_mse: 17.9322 - val_mae: 3.0066
Epoch 78/150
653/653 [==============================] - 1s 1ms/step - loss: 9.4390 - mse: 9.4390 - mae: 2.2968 - val_loss: 17.7456 - val_mse: 17.7456 - val_mae: 2.9964
Epoch 79/150
653/653 [==============================] - 1s 1ms/step - loss: 9.8003 - mse: 9.8003 - mae: 2.3246 - val_loss: 18.2498 - val_mse: 18.2498 - val_mae: 2.9760
Epoch 80/150
653/653 [==============================] - 1s 1ms/step - loss: 10.1243 - mse: 10.1243 - mae: 2.3473 - val_loss: 17.5917 - val_mse: 17.5917 - val_mae: 2.9830
Epoch 81/150
653/653 [==============================] - 1s 1ms/step - loss: 9.7205 - mse: 9.7205 - mae: 2.3228 - val_loss: 17.6694 - val_mse: 17.6694 - val_mae: 2.9629
Epoch 82/150
653/653 [==============================] - 1s 1ms/step - loss: 9.7768 - mse: 9.7768 - mae: 2.3329 - val_loss: 17.5743 - val_mse: 17.5743 - val_mae: 2.9307
Epoch 83/150
653/653 [==============================] - 1s 1ms/step - loss: 9.8143 - mse: 9.8143 - mae: 2.3251 - val_loss: 17.5479 - val_mse: 17.5479 - val_mae: 2.9251
Epoch 84/150
653/653 [==============================] - 1s 1ms/step - loss: 9.5487 - mse: 9.5487 - mae: 2.3063 - val_loss: 17.5385 - val_mse: 17.5385 - val_mae: 2.9251
Epoch 85/150
653/653 [==============================] - 1s 1ms/step - loss: 9.7635 - mse: 9.7635 - mae: 2.3121 - val_loss: 18.6394 - val_mse: 18.6394 - val_mae: 2.9940
Epoch 86/150
653/653 [==============================] - 1s 1ms/step - loss: 9.7524 - mse: 9.7524 - mae: 2.3266 - val_loss: 17.9621 - val_mse: 17.9621 - val_mae: 2.9639
Epoch 87/150
653/653 [==============================] - 1s 1ms/step - loss: 9.5058 - mse: 9.5058 - mae: 2.2863 - val_loss: 18.4059 - val_mse: 18.4059 - val_mae: 2.9546
Epoch 88/150
653/653 [==============================] - 1s 1ms/step - loss: 9.7502 - mse: 9.7502 - mae: 2.3188 - val_loss: 18.2743 - val_mse: 18.2743 - val_mae: 3.0172
Epoch 89/150
653/653 [==============================] - 1s 1ms/step - loss: 9.2655 - mse: 9.2655 - mae: 2.2743 - val_loss: 18.1215 - val_mse: 18.1215 - val_mae: 2.9907
Epoch 90/150
653/653 [==============================] - 1s 1ms/step - loss: 9.6218 - mse: 9.6218 - mae: 2.3077 - val_loss: 18.1123 - val_mse: 18.1123 - val_mae: 2.9947
Epoch 91/150
653/653 [==============================] - 1s 1ms/step - loss: 9.3644 - mse: 9.3644 - mae: 2.2886 - val_loss: 17.6867 - val_mse: 17.6867 - val_mae: 2.9469
Epoch 92/150
653/653 [==============================] - 1s 1ms/step - loss: 9.3772 - mse: 9.3772 - mae: 2.2810 - val_loss: 17.7240 - val_mse: 17.7240 - val_mae: 2.9899
Epoch 93/150
653/653 [==============================] - 1s 1ms/step - loss: 9.3369 - mse: 9.3369 - mae: 2.2841 - val_loss: 19.0883 - val_mse: 19.0883 - val_mae: 3.0340
Epoch 94/150
653/653 [==============================] - 1s 1ms/step - loss: 9.1337 - mse: 9.1337 - mae: 2.2568 - val_loss: 18.2618 - val_mse: 18.2618 - val_mae: 2.9963
Epoch 95/150
653/653 [==============================] - 1s 1ms/step - loss: 9.3538 - mse: 9.3538 - mae: 2.2913 - val_loss: 17.3821 - val_mse: 17.3821 - val_mae: 2.9097
Epoch 96/150
653/653 [==============================] - 1s 1ms/step - loss: 8.9388 - mse: 8.9388 - mae: 2.2462 - val_loss: 18.5424 - val_mse: 18.5424 - val_mae: 3.0296
Epoch 97/150
653/653 [==============================] - 1s 1ms/step - loss: 9.2622 - mse: 9.2622 - mae: 2.2699 - val_loss: 17.9118 - val_mse: 17.9118 - val_mae: 2.9794
Epoch 98/150
653/653 [==============================] - 1s 1ms/step - loss: 9.4219 - mse: 9.4219 - mae: 2.2847 - val_loss: 18.4821 - val_mse: 18.4821 - val_mae: 2.9631
Epoch 99/150
653/653 [==============================] - 1s 1ms/step - loss: 9.3077 - mse: 9.3077 - mae: 2.2663 - val_loss: 18.2749 - val_mse: 18.2749 - val_mae: 2.9798
Epoch 100/150
653/653 [==============================] - 1s 1ms/step - loss: 9.0671 - mse: 9.0671 - mae: 2.2534 - val_loss: 18.8986 - val_mse: 18.8986 - val_mae: 3.0780
Epoch 101/150
653/653 [==============================] - 1s 1ms/step - loss: 9.4914 - mse: 9.4914 - mae: 2.2821 - val_loss: 18.5770 - val_mse: 18.5770 - val_mae: 3.0422
Epoch 102/150
653/653 [==============================] - 1s 1ms/step - loss: 9.3232 - mse: 9.3232 - mae: 2.2832 - val_loss: 18.2224 - val_mse: 18.2224 - val_mae: 2.9150
Epoch 103/150
653/653 [==============================] - 1s 1ms/step - loss: 9.0852 - mse: 9.0852 - mae: 2.2477 - val_loss: 19.2588 - val_mse: 19.2588 - val_mae: 3.0875
Epoch 104/150
653/653 [==============================] - 1s 1ms/step - loss: 8.8386 - mse: 8.8386 - mae: 2.2256 - val_loss: 18.9292 - val_mse: 18.9292 - val_mae: 3.0635
Epoch 105/150
653/653 [==============================] - 1s 1ms/step - loss: 9.2274 - mse: 9.2274 - mae: 2.2680 - val_loss: 19.0644 - val_mse: 19.0644 - val_mae: 3.1153
Epoch 106/150
653/653 [==============================] - 1s 1ms/step - loss: 8.9441 - mse: 8.9441 - mae: 2.2383 - val_loss: 19.6135 - val_mse: 19.6135 - val_mae: 3.1115
Epoch 107/150
653/653 [==============================] - 1s 1ms/step - loss: 9.1500 - mse: 9.1500 - mae: 2.2618 - val_loss: 18.8953 - val_mse: 18.8953 - val_mae: 3.0234
Epoch 108/150
653/653 [==============================] - 1s 1ms/step - loss: 9.1376 - mse: 9.1376 - mae: 2.2457 - val_loss: 18.6499 - val_mse: 18.6499 - val_mae: 3.0354
Epoch 109/150
653/653 [==============================] - 1s 1ms/step - loss: 9.0907 - mse: 9.0907 - mae: 2.2488 - val_loss: 20.3490 - val_mse: 20.3490 - val_mae: 3.1413
Epoch 110/150
653/653 [==============================] - 1s 1ms/step - loss: 9.0536 - mse: 9.0536 - mae: 2.2489 - val_loss: 19.2544 - val_mse: 19.2544 - val_mae: 3.0604
Epoch 111/150
653/653 [==============================] - 1s 1ms/step - loss: 8.7311 - mse: 8.7311 - mae: 2.2199 - val_loss: 18.9462 - val_mse: 18.9462 - val_mae: 2.9770
Epoch 112/150
653/653 [==============================] - 1s 1ms/step - loss: 8.7742 - mse: 8.7742 - mae: 2.2199 - val_loss: 18.9942 - val_mse: 18.9942 - val_mae: 3.0690
Epoch 113/150
653/653 [==============================] - 1s 1ms/step - loss: 8.9356 - mse: 8.9356 - mae: 2.2308 - val_loss: 19.4517 - val_mse: 19.4517 - val_mae: 3.1437
Epoch 114/150
653/653 [==============================] - 1s 1ms/step - loss: 8.9200 - mse: 8.9200 - mae: 2.2414 - val_loss: 19.0449 - val_mse: 19.0449 - val_mae: 3.0700
Epoch 115/150
653/653 [==============================] - 1s 1ms/step - loss: 8.8346 - mse: 8.8346 - mae: 2.2170 - val_loss: 19.8168 - val_mse: 19.8168 - val_mae: 3.1095
Epoch 116/150
653/653 [==============================] - 1s 1ms/step - loss: 9.1292 - mse: 9.1292 - mae: 2.2527 - val_loss: 19.3971 - val_mse: 19.3971 - val_mae: 3.0545
Epoch 117/150
653/653 [==============================] - 1s 1ms/step - loss: 8.9367 - mse: 8.9367 - mae: 2.2349 - val_loss: 20.1376 - val_mse: 20.1376 - val_mae: 3.1623
Epoch 118/150
653/653 [==============================] - 1s 1ms/step - loss: 8.8832 - mse: 8.8832 - mae: 2.2306 - val_loss: 19.3072 - val_mse: 19.3072 - val_mae: 3.0706
Epoch 119/150
653/653 [==============================] - 1s 1ms/step - loss: 8.6935 - mse: 8.6935 - mae: 2.2050 - val_loss: 19.4108 - val_mse: 19.4108 - val_mae: 3.0478
Epoch 120/150
653/653 [==============================] - 1s 1ms/step - loss: 8.6930 - mse: 8.6930 - mae: 2.2102 - val_loss: 18.7314 - val_mse: 18.7314 - val_mae: 3.0123
Epoch 121/150
653/653 [==============================] - 1s 1ms/step - loss: 8.7502 - mse: 8.7502 - mae: 2.2281 - val_loss: 18.6906 - val_mse: 18.6906 - val_mae: 3.0163
Epoch 122/150
653/653 [==============================] - 1s 1ms/step - loss: 8.5407 - mse: 8.5407 - mae: 2.1949 - val_loss: 18.9833 - val_mse: 18.9833 - val_mae: 3.0371
Epoch 123/150
653/653 [==============================] - 1s 1ms/step - loss: 8.8790 - mse: 8.8790 - mae: 2.2253 - val_loss: 18.9066 - val_mse: 18.9066 - val_mae: 3.0055
Epoch 124/150
653/653 [==============================] - 1s 1ms/step - loss: 8.6656 - mse: 8.6656 - mae: 2.2067 - val_loss: 19.3024 - val_mse: 19.3024 - val_mae: 3.1025
Epoch 125/150
653/653 [==============================] - 1s 1ms/step - loss: 8.6920 - mse: 8.6920 - mae: 2.2164 - val_loss: 19.8367 - val_mse: 19.8367 - val_mae: 3.0859
Epoch 126/150
653/653 [==============================] - 1s 1ms/step - loss: 8.9484 - mse: 8.9484 - mae: 2.2328 - val_loss: 19.1984 - val_mse: 19.1984 - val_mae: 3.1048
Epoch 127/150
653/653 [==============================] - 1s 1ms/step - loss: 8.8170 - mse: 8.8170 - mae: 2.2217 - val_loss: 19.3707 - val_mse: 19.3707 - val_mae: 3.1309
Epoch 128/150
653/653 [==============================] - 1s 1ms/step - loss: 8.5804 - mse: 8.5804 - mae: 2.1948 - val_loss: 19.5963 - val_mse: 19.5963 - val_mae: 3.0817
Epoch 129/150
653/653 [==============================] - 1s 1ms/step - loss: 8.5176 - mse: 8.5176 - mae: 2.2008 - val_loss: 19.6974 - val_mse: 19.6974 - val_mae: 3.1062
Epoch 130/150
653/653 [==============================] - 1s 995us/step - loss: 8.5948 - mse: 8.5948 - mae: 2.2090 - val_loss: 20.6506 - val_mse: 20.6506 - val_mae: 3.2053
Epoch 131/150
653/653 [==============================] - 1s 1ms/step - loss: 8.6176 - mse: 8.6176 - mae: 2.2031 - val_loss: 19.6514 - val_mse: 19.6514 - val_mae: 3.0509
Epoch 132/150
653/653 [==============================] - 1s 1ms/step - loss: 8.7041 - mse: 8.7041 - mae: 2.2017 - val_loss: 19.9885 - val_mse: 19.9885 - val_mae: 3.1129
Epoch 133/150
653/653 [==============================] - 1s 1ms/step - loss: 8.4201 - mse: 8.4201 - mae: 2.1792 - val_loss: 19.6682 - val_mse: 19.6682 - val_mae: 3.0735
Epoch 134/150
653/653 [==============================] - 1s 1ms/step - loss: 8.4058 - mse: 8.4058 - mae: 2.1821 - val_loss: 19.3724 - val_mse: 19.3724 - val_mae: 3.1028
Epoch 135/150
653/653 [==============================] - 1s 1ms/step - loss: 8.6227 - mse: 8.6227 - mae: 2.1971 - val_loss: 20.4535 - val_mse: 20.4535 - val_mae: 3.1526
Epoch 136/150
653/653 [==============================] - 1s 1ms/step - loss: 8.5839 - mse: 8.5839 - mae: 2.1833 - val_loss: 20.3361 - val_mse: 20.3361 - val_mae: 3.1640
Epoch 137/150
653/653 [==============================] - 1s 1ms/step - loss: 8.4600 - mse: 8.4600 - mae: 2.1894 - val_loss: 21.2538 - val_mse: 21.2538 - val_mae: 3.2185
Epoch 138/150
653/653 [==============================] - 1s 1ms/step - loss: 8.5381 - mse: 8.5381 - mae: 2.1944 - val_loss: 19.2235 - val_mse: 19.2235 - val_mae: 3.0862
Epoch 139/150
653/653 [==============================] - 1s 1ms/step - loss: 8.4297 - mse: 8.4297 - mae: 2.1829 - val_loss: 19.1773 - val_mse: 19.1773 - val_mae: 3.1070
Epoch 140/150
653/653 [==============================] - 1s 1ms/step - loss: 8.4853 - mse: 8.4853 - mae: 2.1849 - val_loss: 20.9601 - val_mse: 20.9601 - val_mae: 3.2037
Epoch 141/150
653/653 [==============================] - 1s 1ms/step - loss: 8.2686 - mse: 8.2686 - mae: 2.1617 - val_loss: 19.6777 - val_mse: 19.6777 - val_mae: 3.0882
Epoch 142/150
653/653 [==============================] - 1s 1ms/step - loss: 8.5493 - mse: 8.5493 - mae: 2.1774 - val_loss: 19.9995 - val_mse: 19.9995 - val_mae: 3.1042
Epoch 143/150
653/653 [==============================] - 1s 1ms/step - loss: 8.7201 - mse: 8.7201 - mae: 2.2045 - val_loss: 20.6372 - val_mse: 20.6372 - val_mae: 3.1600
Epoch 144/150
653/653 [==============================] - 1s 1ms/step - loss: 8.5227 - mse: 8.5227 - mae: 2.1888 - val_loss: 21.5377 - val_mse: 21.5377 - val_mae: 3.1945
Epoch 145/150
653/653 [==============================] - 1s 1ms/step - loss: 8.6380 - mse: 8.6380 - mae: 2.1993 - val_loss: 19.6394 - val_mse: 19.6394 - val_mae: 3.0386
Epoch 146/150
653/653 [==============================] - 1s 1ms/step - loss: 8.4410 - mse: 8.4410 - mae: 2.1706 - val_loss: 19.9963 - val_mse: 19.9963 - val_mae: 3.1084
Epoch 147/150
653/653 [==============================] - 1s 1ms/step - loss: 8.3274 - mse: 8.3274 - mae: 2.1692 - val_loss: 20.5860 - val_mse: 20.5860 - val_mae: 3.1550
Epoch 148/150
653/653 [==============================] - 1s 1ms/step - loss: 8.4102 - mse: 8.4102 - mae: 2.1756 - val_loss: 20.7254 - val_mse: 20.7254 - val_mae: 3.1713
Epoch 149/150
653/653 [==============================] - 1s 1ms/step - loss: 8.1982 - mse: 8.1982 - mae: 2.1554 - val_loss: 19.8987 - val_mse: 19.8987 - val_mae: 3.1628
Epoch 150/150
653/653 [==============================] - 1s 1ms/step - loss: 8.5560 - mse: 8.5560 - mae: 2.1832 - val_loss: 21.0024 - val_mse: 21.0024 - val_mae: 3.1961

We can now visualize the loss (MSE) for training and validation data over the epochs:

In [54]:
print(history.history.keys())
# "Loss"
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
dict_keys(['loss', 'mse', 'mae', 'val_loss', 'val_mse', 'val_mae'])
[Figure: training vs. validation loss (MSE) over epochs]
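
The plot shows the training loss steadily decreasing while the validation loss drifts upward, which suggests the model starts to overfit after the early epochs. One common remedy, sketched below (not part of the original run), is Keras's EarlyStopping callback, which stops training once the validation loss stops improving:

from keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 10 consecutive epochs and
# keep the weights from the best epoch seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

# history = model.fit(trainData, trainLabels, epochs=150, batch_size=50,
#                     verbose=1, validation_split=0.2, callbacks=[early_stop])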

And finally, we can use the model to make predictions over our test data:

In [55]:
test_predictions = model.predict(testData).flatten()

a = plt.axes(aspect='equal')
plt.scatter(testLabels, test_predictions)
plt.xlabel('True Values [Hazard]')
plt.ylabel('Predictions [Hazard]')
lims = [Y.min(), Y.max()]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)
[Figure: predicted vs. true Hazard values on the test data]
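
To complement the scatter plot, we could also report the test-set error numerically. A minimal sketch, assuming the model, testData, and testLabels from above (this cell was not part of the original run):

# evaluate() returns the loss followed by the compiled metrics (mse, mae).
test_loss, test_mse, test_mae = model.evaluate(testData, testLabels, verbose=0)
print('Test MSE: %.2f, Test MAE: %.2f' % (test_mse, test_mae))
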

2. Deep Learning for Classification

In the example below, we use the U.S. Adult Salary data set to predict whether an individual's income is above or below $50K. The data is obtained from the U.S. Census and includes the following columns:

age: Age of the individual.

workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.

fnlwgt: continuous. Not useful for the analysis and should be excluded.

education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.

education-num: continuous. Contains the same information as education, but represented by numbers instead of characters.

marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.

occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.

relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried. Not useful for the analysis and should be excluded.

race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.

sex: Female, Male.

capital-gain: continuous.

capital-loss: continuous.

hours-per-week: continuous.

native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.

salary: Whether the individual makes over $50K or not.

Missing values are denoted in the data by ?. For more information about this dataset, please refer to: https://archive.ics.uci.edu/ml/datasets/adult

We go ahead and import the data, making sure to tell pandas which symbol marks missing values. We also rename the columns to make them more consistent:

In [56]:
data = pd.read_csv('data/adult.csv', na_values = '?')
data.columns = ['age','workclass','fnlwgt','education','education-num','marital-status',
               'occupation','relationship','race','sex','capital-gain','capital-loss',
               'hours-per-week','native-country','salary']
data.head(2)
Out[56]:
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country salary
0 90 NaN 77053 HS-grad 9 Widowed NaN Not-in-family White Female 0 4356 40 United-States <=50K
1 82 Private 132870 HS-grad 9 Widowed Exec-managerial Not-in-family White Female 0 4356 18 United-States <=50K

Some of these columns are either not useful or redundant, so we remove them from our data:

In [57]:
data.drop(columns = ['fnlwgt', # Not useful
                     'education', # Redundant
                     'relationship' # Not useful
                    ], inplace = True # Overwrite data
         )

data.head(2)
Out[57]:
age workclass education-num marital-status occupation race sex capital-gain capital-loss hours-per-week native-country salary
0 90 NaN 9 Widowed NaN White Female 0 4356 40 United-States <=50K
1 82 Private 9 Widowed Exec-managerial White Female 0 4356 18 United-States <=50K

We can take a look at some descriptive information from the data:

In [58]:
data.describe(include = 'all')
Out[58]:
age workclass education-num marital-status occupation race sex capital-gain capital-loss hours-per-week native-country salary
count 32561.000000 30725 32561.000000 32561 30718 32561 32561 32561.000000 32561.000000 32561.000000 31978 32561
unique NaN 8 NaN 7 14 5 2 NaN NaN NaN 41 2
top NaN Private NaN Married-civ-spouse Prof-specialty White Male NaN NaN NaN United-States <=50K
freq NaN 22696 NaN 14976 4140 27816 21790 NaN NaN NaN 29170 24720
mean 38.581647 NaN 10.080679 NaN NaN NaN NaN 1077.648844 87.303830 40.437456 NaN NaN
std 13.640433 NaN 2.572720 NaN NaN NaN NaN 7385.292085 402.960219 12.347429 NaN NaN
min 17.000000 NaN 1.000000 NaN NaN NaN NaN 0.000000 0.000000 1.000000 NaN NaN
25% 28.000000 NaN 9.000000 NaN NaN NaN NaN 0.000000 0.000000 40.000000 NaN NaN
50% 37.000000 NaN 10.000000 NaN NaN NaN NaN 0.000000 0.000000 40.000000 NaN NaN
75% 48.000000 NaN 12.000000 NaN NaN NaN NaN 0.000000 0.000000 45.000000 NaN NaN
max 90.000000 NaN 16.000000 NaN NaN NaN NaN 99999.000000 4356.000000 99.000000 NaN NaN

We can also check what data types we have in this data:

In [59]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   age             32561 non-null  int64 
 1   workclass       30725 non-null  object
 2   education-num   32561 non-null  int64 
 3   marital-status  32561 non-null  object
 4   occupation      30718 non-null  object
 5   race            32561 non-null  object
 6   sex             32561 non-null  object
 7   capital-gain    32561 non-null  int64 
 8   capital-loss    32561 non-null  int64 
 9   hours-per-week  32561 non-null  int64 
 10  native-country  31978 non-null  object
 11  salary          32561 non-null  object
dtypes: int64(5), object(7)
memory usage: 3.0+ MB

We see that there are several rows with missing values. We go ahead and drop all rows with any missing values from the data:

In [60]:
data.dropna(inplace = True)
In [61]:
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 30162 entries, 1 to 32560
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   age             30162 non-null  int64 
 1   workclass       30162 non-null  object
 2   education-num   30162 non-null  int64 
 3   marital-status  30162 non-null  object
 4   occupation      30162 non-null  object
 5   race            30162 non-null  object
 6   sex             30162 non-null  object
 7   capital-gain    30162 non-null  int64 
 8   capital-loss    30162 non-null  int64 
 9   hours-per-week  30162 non-null  int64 
 10  native-country  30162 non-null  object
 11  salary          30162 non-null  object
dtypes: int64(5), object(7)
memory usage: 3.0+ MB

There are a few things we need to take care of before we build our model:

  1. Prepare the categorical variables
  2. Separate the predictors from the target variable
  3. Scale the predictors

To prepare the categorical variables, we use a powerful package called category_encoders:

In [62]:
encoder = BaseNEncoder(cols=['workclass', # Categorical variable to be encoded
                             'marital-status', # Categorical variable to be encoded
                             'occupation', # Categorical variable to be encoded
                             'race', # Categorical variable to be encoded
                             'sex', # Categorical variable to be encoded
                             'native-country' # Categorical variable to be encoded
                            ],
                        base = 3 # Increasing this value will create fewer variables
                      ).fit(data)

df = encoder.transform(data) # We call the transformed data df. From now on, we refer to the data as df
C:\Users\greer\AppData\Local\Programs\Python\Python38\lib\site-packages\category_encoders\utils.py:21: FutureWarning: is_categorical is deprecated and will be removed in a future version.  Use is_categorical_dtype instead
  elif pd.api.types.is_categorical(cols):
In [63]:
df.head(2)
Out[63]:
age workclass_0 workclass_1 workclass_2 education-num marital-status_0 marital-status_1 marital-status_2 occupation_0 occupation_1 ... sex_1 capital-gain capital-loss hours-per-week native-country_0 native-country_1 native-country_2 native-country_3 native-country_4 salary
1 82 0 0 1 9 0 0 1 0 0 ... 1 0 4356 18 0 0 0 0 1 <=50K
3 54 0 0 1 4 0 0 2 0 0 ... 1 0 3900 40 0 0 0 0 1 <=50K

2 rows × 26 columns
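
To make the effect of the base parameter more concrete, here is a small, self-contained sketch (using a made-up toy column, not a column from the dataset) showing how BaseNEncoder turns one categorical column into several numeric columns, and how a larger base produces fewer of them:

toy = pd.DataFrame({'color': ['red', 'green', 'blue', 'yellow', 'purple']})
for base in [2, 3]:
    enc = BaseNEncoder(cols=['color'], base=base).fit(toy)
    print('base =', base, '->', enc.transform(toy).shape[1], 'encoded columns')
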

The next thing we need to do before we build our model is to separate our predictors from our target variable:

In [64]:
X = df.drop(columns = ['salary'])
feature_list = X.columns
y = np.where(df.salary=="<=50K", 0, 1) # Convert the target values to 0s (less than 50K) and 1s (more than 50K)

And finally, we scale the predictors to make them comparable:

In [65]:
scaler = MinMaxScaler() # Define the scaling method
scaler.fit(X) # Fit the scaling method
X = scaler.transform(X) # Apply the scaling method to X and call it X again

Split the data into Train and Test:

To build a generalizable predictive model, we need an unbiased way to evaluate it. Machine learning models can be quite complex, and it can be hard to understand how they actually learn from the data. One thing we can be sure of is that if we evaluate the model on data it was not trained on, we can examine to what extent it generalizes. For instance, if we use one portion of our original data to build/train the model and another portion to test/evaluate it, we can check how well the model performs on samples it has never seen. Splitting the original data into train and test samples is called the hold-out method: we essentially hold a portion of the original data out so that we can use it to test and evaluate the model. To split the data into train and test samples, we can use scikit-learn:

In [66]:
trainData, testData, trainLabels, testLabels = train_test_split(X, 
                                                                y, 
                                                                train_size = .8, # Proportion of train samples
                                                                random_state = 1 # We set this to create reproducible results
                                                               )
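
A quick shape check (illustrative, not part of the original run) confirms the 80/20 split:

print(trainData.shape, testData.shape)  # roughly 80% and 20% of the rows, respectively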

Now, we can go ahead and design our network:

In [69]:
# Clear the previous model:
clear_session()

model = Sequential()
model.add(Dense(100, activation='relu')) # First hidden layer (the input shape is inferred when the model first sees data)
model.add(Dense(100, activation='relu')) # Second hidden layer
model.add(Dense(1, activation='sigmoid')) # Output layer: probability that salary is above 50K

Let's go ahead and compile and fit the model:

In [70]:
model.compile(loss='binary_crossentropy', optimizer='adam')
In [72]:
model.fit(trainData, trainLabels, 
          epochs=50, 
          batch_size=50, 
          verbose=1,
          validation_split=0.2)
Epoch 1/50
387/387 [==============================] - 1s 2ms/step - loss: 0.4736 - val_loss: 0.3729
Epoch 2/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3629 - val_loss: 0.3471
Epoch 3/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3475 - val_loss: 0.3577
Epoch 4/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3430 - val_loss: 0.3326
Epoch 5/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3317 - val_loss: 0.3438
Epoch 6/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3414 - val_loss: 0.3277
Epoch 7/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3259 - val_loss: 0.3313
Epoch 8/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3272 - val_loss: 0.3279
Epoch 9/50
387/387 [==============================] - 1s 1ms/step - loss: 0.3257 - val_loss: 0.3242
Epoch 10/50
387/387 [==============================] - 1s 1ms/step - loss: 0.3276 - val_loss: 0.3240
Epoch 11/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3254 - val_loss: 0.3252
Epoch 12/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3194 - val_loss: 0.3235
Epoch 13/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3152 - val_loss: 0.3212
Epoch 14/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3258 - val_loss: 0.3267
Epoch 15/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3189 - val_loss: 0.3197
Epoch 16/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3160 - val_loss: 0.3195
Epoch 17/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3162 - val_loss: 0.3227
Epoch 18/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3290 - val_loss: 0.3291
Epoch 19/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3131 - val_loss: 0.3228
Epoch 20/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3130 - val_loss: 0.3180
Epoch 21/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3095 - val_loss: 0.3204
Epoch 22/50
387/387 [==============================] - 1s 1ms/step - loss: 0.3076 - val_loss: 0.3191
Epoch 23/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3115 - val_loss: 0.3202
Epoch 24/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3155 - val_loss: 0.3215
Epoch 25/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3124 - val_loss: 0.3220
Epoch 26/50
387/387 [==============================] - 1s 1ms/step - loss: 0.3117 - val_loss: 0.3208
Epoch 27/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3087 - val_loss: 0.3222
Epoch 28/50
387/387 [==============================] - 1s 1ms/step - loss: 0.3111 - val_loss: 0.3184
Epoch 29/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3131 - val_loss: 0.3221
Epoch 30/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3050 - val_loss: 0.3220
Epoch 31/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3004 - val_loss: 0.3271
Epoch 32/50
387/387 [==============================] - 1s 1ms/step - loss: 0.3055 - val_loss: 0.3175
Epoch 33/50
387/387 [==============================] - 1s 1ms/step - loss: 0.3023 - val_loss: 0.3201
Epoch 34/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3014 - val_loss: 0.3308
Epoch 35/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3067 - val_loss: 0.3285
Epoch 36/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3036 - val_loss: 0.3219
Epoch 37/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3068 - val_loss: 0.3208
Epoch 38/50
387/387 [==============================] - 0s 1ms/step - loss: 0.2958 - val_loss: 0.3216
Epoch 39/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3002 - val_loss: 0.3225
Epoch 40/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3025 - val_loss: 0.3341
Epoch 41/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3077 - val_loss: 0.3239
Epoch 42/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3004 - val_loss: 0.3254
Epoch 43/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3018 - val_loss: 0.3231
Epoch 44/50
387/387 [==============================] - 0s 1ms/step - loss: 0.2945 - val_loss: 0.3258
Epoch 45/50
387/387 [==============================] - 1s 1ms/step - loss: 0.3020 - val_loss: 0.3224
Epoch 46/50
387/387 [==============================] - 0s 1ms/step - loss: 0.2943 - val_loss: 0.3338
Epoch 47/50
387/387 [==============================] - 0s 1ms/step - loss: 0.3010 - val_loss: 0.3376
Epoch 48/50
387/387 [==============================] - 0s 1ms/step - loss: 0.2985 - val_loss: 0.3321
Epoch 49/50
387/387 [==============================] - 0s 1ms/step - loss: 0.2935 - val_loss: 0.3277
Epoch 50/50
387/387 [==============================] - 0s 1ms/step - loss: 0.2903 - val_loss: 0.3309
Out[72]:
<tensorflow.python.keras.callbacks.History at 0x1f1131cac70>

We can review the model summary and then obtain predictions from the model:

In [73]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 100)               2600      
_________________________________________________________________
dense_1 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 101       
=================================================================
Total params: 12,801
Trainable params: 12,801
Non-trainable params: 0
_________________________________________________________________
In [35]:
predictions = (model.predict(testData) > 0.5).astype("int32").flatten()
predictionProbabilities = model.predict(testData).flatten()
In [36]:
# Importing function that can be used to calculate different metrics such as accuracy, precision, recall.
from sklearn.metrics import * 

def calculateMetricsAndPrint(predictions, predictionsProbabilities, actualLabels):
    accuracy = accuracy_score(actualLabels, predictions) * 100
    precisionNegative = precision_score(actualLabels, predictions, average = None)[0] * 100
    precisionPositive = precision_score(actualLabels, predictions, average = None)[1] * 100
    recallNegative = recall_score(actualLabels, predictions, average = None)[0] * 100
    recallPositive = recall_score(actualLabels, predictions, average = None)[1] * 100
    auc = roc_auc_score(actualLabels, predictionsProbabilities) * 100
    
    print("Accuracy: %.2f\nPrecisionNegative: %.2f\nPrecisionPositive: %.2f\nRecallNegative: %.2f\nRecallPositive: %.2f\nAUC Score: %.2f\n" % 
          (accuracy, precisionNegative, precisionPositive, recallNegative, recallPositive, auc))
    
calculateMetricsAndPrint(predictions, predictionProbabilities, testLabels)
Accuracy: 85.15
PrecisionNegative: 88.05
PrecisionPositive: 74.33
RecallNegative: 92.74
RecallPositive: 62.58
AUC Score: 90.62
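
For another view of the same predictions, we could also print a confusion matrix (a sketch using scikit-learn's confusion_matrix, already imported above via sklearn.metrics; this cell was not part of the original run):

# Rows are actual classes (0 = <=50K, 1 = >50K); columns are predicted classes.
print(confusion_matrix(testLabels, predictions))
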

In [37]:
def plot_roc_curve(fpr, tpr):
    plt.plot(fpr, tpr, color='orange', label='ROC')
    plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend()
    plt.show()
    
pos_probs = predictionProbabilities

fpr, tpr, thresholds = roc_curve(testLabels, pos_probs, pos_label = 1)

# calculate scores
lr_auc = roc_auc_score(testLabels, pos_probs)
print('AUC Score = %.3f' % (lr_auc * 100))
plt.rcParams['figure.figsize'] = [7, 7]
plot_roc_curve(fpr, tpr)
AUC Score = 90.616
[Figure: ROC curve for the classification model]